A Two Colorable Fourth Order Compact Difference 
Scheme and Parallel Iterative Solution of 
the 3D Convection Diffusion Equation 

Jun Zhang* and Lixin GF 
Department of Computer Science, University of Kentucky 
773 Anderson Hall, Lexington, KY 40506-0046, USA 

Jules Kouatchou* 

NASA Goddard Space Flight Center - Code 931 
Greenbelt, MD 20771, USA 

March 12, 2000 


Abstract 

A new fourth order compact difference scheme for the three dimensional convection diffu- 
sion equation with variable coefficients is presented. The novelty of this new difference scheme 
is that it only requires 15 grid points and that it can be decoupled with two colors. The 
entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel 
type iterative method. This is compared with the known 19 point fourth order compact differ- 
ence scheme w*hich requires four colors to decouple the computational grid. Numerical results, 
with multigrid methods implemented on a shared memory parallel computer, are presented to 
compare the 15 point and the 19 point fourth order compact schemes. 


Key words: 3D convection diffusion equation, fourth order compact difference schemes, multigrid 
method, parallel computation. 

1 Introduction 

The three dimensional (3D) convection diffusion equation with variable coefficients can be written 
as 

Mix + Uyy + u zz + p{x,y,z)u x + q(x,y,z)u„ + r(x,y,z)u. = f(x,y,z), (1) 
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for a specified forcing function / in a continuous domain ft in 3D space with suitable boundary 
conditions prescribed on 50, the boundary of 0. Here the coefficients p, g,r, the forcing function 
/> as well as the unknown function w, are assumed to be continuously differentiable and have the 
required partial derivatives on D, where O is a union of rectangular solids. 

Equation (1) is encountered most commonly in the modeling of transport processes, including 
heat transfer and fluid flows [22, 25], such as the groundwater pollution problems and reservoir 
displacement problems [2, 3]. It describes the convection and diffusion of various physical quan- 
tities, e.g., momentum, heat, material concentrations, etc. Traditional numerical discretization 
schemes for approximating convection diffusion equations usually employ centered differencing for 
the second order diffusion terms and some form of upwind differencing for the first order con- 
vection terms [23]. For convection dominated problems, basic iterative methods fail to converge 
when used to solve linear systems resulting from the standard central difference discretization. 
The computed solutions from the standard upwind difference scheme is only first order accurate. 
Very fine discretization has to be employed to compute approximate solution with high accuracy, 
which in turn requires enormous computational power for 3D problems. Thus, the use of high 
order discretization schemes is one way to obtain high accuracy solution with moderate computa- 
tional cost. A 19 point fourth order compact finite difference scheme for (1) has been published 
in [34], based on the truncated Taylor series expansions. Other fourth order compact schemes for 
the 3D elliptic partial differential equations can be found in [1, 11]. Alternative high accuracy 
discretization schemes for 2D convection diffusion problems have also been reported in [19, 20, 21]. 
A parallelizable multigrid method with the 19 point fourth order compact scheme using a four 
color decoupling of computational grid has been developed by Gupta and Zhang in [15]. 

In parallel calculations with a Gauss-Seidel type iterative method, a computational grid 
decoupled with four colors needs four parallel subsweeps to update the entire grid. If the standard 
second order central difference or upwind difference schemes are used, the computational grid 
can be decoupled with two colors and updated in two parallel subsweeps. For the 2D convection 
diffusion equation, it can be shown that a fourth order compact scheme needs the closest 9 grid 
points, for which the computational grids can be decoupled with minimum 4 colors. It would be 
advantageous to find a fourth order compact scheme that does not need more than two colors to 
decouple the computational grid and still offers computed solution with high accuracy. This does 
not seem to be possible for the 2D convection diffusion problems. 

The work of Gupta and Kouatchou [13] shows that it is possible to derive a fourth order 
compact difference scheme for the 3D Poisson equation that requires only 15 grid points in the 
approximation scheme. The current work is to derive a 15 point compact difference scheme for the 
3D convection diffusion equation with variable coefficients, to design a parallel multigrid solution 
method to solve the resulting sparse linear systems, and to compare its numerical performance 
with the existing 19 point compact scheme. 

This paper is organized as follows. In Section 2, we present the method for deriving the 15 
point compact difference scheme. In Section 3, we discuss the multigrid solution method. Section 4 
contains strategies for decoupling the computational grid to extract parallelism in a Gauss-Seidel 
iteration. Numerical results are presented in Section 5 to compare the solution accuracy and 
parallel efficiency of the 15 and 19 point compact difference schemes. A brief conclusion is given 
in Section 6. The stencil coefficients of the 15 point fourth order compact finite difference scheme 
are listed in Appendix A. 


2 


y 24 16 23 



Figure 1: The 27 point stencil of the 3D grid points in a reference cube. 


2 Description of Derivation Procedure 


The discretization is carried out on a uniform 3D grid with a uniform mesh size h. We use a local 
coordinate system where the unit cubic grids are labeled as in Figure 1. The approximate value 
of a function u(x,y, z) at an interior mesh point (i, j. k) is denoted by no* The approximate values 
of its 26 immediate neighboring mesh points are denoted by ?//, / = 1,2, 26, as in Figure 1. The 
discrete values of pi,Qhn and // for l = 0, 1, ...,26, are defined analogously. A 3D finite difference 
scheme is compact if it only involves at most the 26 nearest neighboring grid points (of the center 
point) in the approximation formula. For convenience we divide the grid [joints into three groups: 
A - {0,1,2,3,4,5,61, B = {7,8,9,10,11,12,13,14,15,16,18}, C = {19,20,21,22,23,24,25,26}, 
see Figure 1. Group .4 contains the essential grid points needed for a 3D finite difference scheme 
for (1). Well known examples are the standard central difference scheme and the upwind difference 
scheme. The 19 point compact scheme in [34] utilizes the grid points in groups A and B . The 15 
point compact scheme that we will derive later utilizes the grid points in groups A and C only. 

The approach that we take to develop high order compact difference schemes was advocated 
by Spotz and Carey [27, 28, 29]. It has been used with a symbolic computation procedure by 
Ge and Zhang [11] to derive high order compact difference schemes for the 3D linear elliptic 
partial differential equations with variable coefficients. The entire procedure for deriving high 
order compact schemes is straightforward and can be done step by step. The truncation errors 
of the lower order approximations are approximated to higher orders to yield a high order finite 
difference scheme for the initial approximation. A unique finite difference scheme is given by the 
symbolic computation package Maple. All that needs to be done is complex substitution and term 
collection processes, which are especially suitable for symbolic computation packages. 

For a sufficiently smooth solution u, its first and second order partial derivatives with respect 
to x at an interior grid point 0 can be approximated by 


du 
dx 
d 2 n 
Ox 2 


< 5 * 


h 2 d'u 
6 dx 3 


/r d 4 u 
12 dx 4 


h 4 d h u 
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(3) 


where 5 X and S'~ are the first and second order central difference operators with respect to r. Similar 
partial derivatives with respect to y and z can be approximated to 0(h e ) order analogously. 


3 






Different finite difference schemes can be derived by substituting the approximation formulas 
(2). (3), and their counterparts for the y and z variables, for the first and second order partial 
derivatives in (1) and dropping the reminders of appropriate order. As an example, we derive in 
the following some compact difference schemes up to the fourth order. For this purpose, the 0(h 4 ) 
and the higher order truncation error terms in (2) and (3) can be ignored. The substitution yields 


(Slu + 6yU + 8zu + p5 x u + qdyii + rS z u /) 

JjL(LJ*L 1 ° A 1 d 4 d 3 d 3 9 3 \ 

6 \2dx 4+ 2 dy 4 + 2 dz 4+ V dx 3 + Q dy 3 + r dz 3 ) 


u + 0(h 4 ) =0. 


( 4 ) 


The standard 7 point second order central difference scheme is obtained by dropping all the 0(h 2 ) 
and the higher order terms in (4). To obtain a difference scheme with a higher order, the 0(h 2 ) 
terms in (4) cannot be dropped and has to be approximated further. Since the 0(h 2 ) terms have 
an /r factor, they can be approximated to the second order accuracy and still yield the fourth 
order accuracy for the whole approximation scheme. 

The key idea for increasing approximation accuracy is that the truncation errors pertaining 
to the discrete operator may be represented in the final discrete equation. For instance, in the 
case of the central difference operator for the first order derivative, the 0(ti 2 ) and the higher order 
truncation error terms can be represented using the original differential equation such that the 
order of accuracy is increased depending on how many terms are represented. To illustrate this 
idea, we differentiate (1) with respect to z, ?y, and z, in sequence, to obtain 
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for the third order partial derivatives, and repeat the process to obtain 
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( 5 ) 

(6) 
( 7 ) 


( 8 ) 

( 9 ) 


( 10 ) 


for the fourth order partial derivatives. If the right hand sides of these partial derivatives can be 
approximated to the 0(K 2 ) order, then the finite difference scheme (4) can be approximated to the 
0(h 4 ) order, per our previous discussion. 

The 15 point compact scheme can be derived by considering cross derivatives of the same 
order together and by utilizing their symmetry relation. This is different from the strategies used 
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for deriving the 19 point compact scheme, in which cross derivatives are approximated individually 

[U], 

Using the Taylor series expansion for the fourth order cross derivatives, we have 
d 4 u d 4 u d 4 u 1 r 
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■f 16 i/q ~ 4(m -f Ux 4- w-3 4- ?/.j 4 «5 4* i/g)]. (11) 


In addition, exploiting the relations among the third order cross derivatives, we get 
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Using Taylor series expansion again, the second order cross derivatives can be approximated as 
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With the same strategy, and jppp can be approximated using the grid points in groups ,4 and 
C only. Therefore, all the partial derivatives in (5) to (10) can be approximated to 0(/r) order 
using the first and second order central difference operators involving the grid points in groups .4 
and C. Now if we substitute the finite difference expressions of (5) - (10), using (11) - (15) and 
their counterparts, into (4), we will have a fourth order compact finite difference scheme for (1) 
defined at the grid points in groups ,4 and C. 

We used the Maple symbolic computation package for the extensive algebraic manipulations. 
The Maple code is similar to that in [11] used to derive the 19 point compact scheme. The 
computations were performed on an HP Exemplar supercomputer at the University of Kentucky 
Center for Computational Sciences. Appendix A lists the resulting formula of the 15 point fourth 
order compact difference scheme that can be used directly. The coefficients are scaled appropriately 
so that they have the same scale as those of the 19 point compact scheme of [34]. The scaling is 
done for the convenience of comparing truncation errors of the two difference schemes. 

Next, we compare the leading 0(h 4 ) terms in the truncation errors of the 15 and 19 point 
compact schemes. To simplify notations and still make meaningful comparison, we assume that 
the convection coefficients p,q and r are constant in ft. The leading truncation error terms were 
computed by our Maple code. 

The leading truncation error with the 0{h 4 ) term of the 19 point compact scheme is 
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1 / d^u d h u d b u d h u d 5 u 0 5 u \ 
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Similarly, the leading truncation error with the 0(h 4 ) term of the 15 point compact scheme is 


&15 — ^19 


1 / d 4 u d 4 u d 4 u 

12 dz 2 dydx + dzdy 2 dx + ^ dzdydx 2 


9 5 u d 6 it d^u d 6 u \ 

+ P dz 2 dy‘ 2 dx + ^ dz 2 dydx 2 T dzdy 2 dx 2 dz 2 dy 2 dx 2 ) 


We note that E\§ contains all terms of E\§, as well as a few extra terms involving the cross 
derivatives of all three variables. The most visible observation is that the largest coefficient factor 
for E \ 5 is 3 times larger than that for E 19 , if the magnitude of the convection coefficients is around 
1. It follows that, statistically, the errors of the 15 point compact scheme are a factor of 3 larger 
than the corresponding errors of the 19 point compact scheme. However, if the magnitude of the 
convection coefficients is large, the coefficient factor of E \ 5 behaves like 0(-p,Re 2 ), while that of 
E 19 behaves like 0(^-Re 2 ). Here, Re denotes the magnitude of the convection coefficients. Hence, 
the magnitude of the convection coefficients affects the accuracy of both schemes inversely. 


3 Multigrid Solution Method 

Each of the fourth order compact finite difference discretization schemes result in a system of linear 
equations of the form 

Au — 5, 

where A is the coefficient matrix, u is the solution vector (unknown), and b is the right hand 
side vector, which includes the forcing term and boundary condition information. Each row of A 
corresponding to an interior node away from the boundary contains 15 (or 19) nonzero entries for 
the 15 (or 19) point compact schemes. Those rows corresponding to the nodes next to the boundary 
contain fewer nonzero entries. In general, A has 15 (or 19) nonzero diagonals. The linear system 
has to be solved by some solution technique to yield a solution, i.e., u = A~ l b. For the current 3 D 
problems, the dimension of A is in general very large. Direct solution method based on Gaussian 
elimination is usually refrained from consideration due to the excessive requirements on computer 
memory and CPU time. Iterative techniques are viewed as a more viable means in solving 3 D 
problems. The major disadvantage of many iterative techniques is that their convergence may not 
be guaranteed for solving general sparse linear systems. Even if an iterative method converges for 
solving a given problem, its convergence rate is usually dependent on many factors, e.g., on the 
size of the linear system. In the current situation, the size of the linear system is reflected by the 
mesh size h. 

The coefficient matrix will be used many times in an iterative method. It is usually computed 
explicitly and stored before it is utilized. On average, each row of the coefficient matrix of the 
19 point compact scheme, j 4 i 9 , has 4 more nonzero entries than that of the 15 point compact 
scheme, A 19 . Then A^ 5 uses at least 21.05% less storage space. This may be an advantage in 3 D 
computations where computer memory is usually a major constraint. 

For the 19 point compact scheme, it is possible to show that the convergence of some basic 
iterative methods, such as Jacobi and Gauss-Seidel methods, is guaranteed if the cell Reynolds 
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number condition 


^nax ( ^ y? 2 )€Q {lc[ ? lri| ; |e|} 

2 h 

is satisfied. Under this condition. A may be weakly diagonally dominant, i.e., the inequality 
|tt 0 | > | at | holds for all rows of .4, using the stencil notation in (16). A rigorous convergence 

proof for the 2D analogous problem with constant coefficients is given in [33] and its generalization 
to the 3D problem is straightforward. However, even if this cell Reynolds number condition is 
violated, numerical experiments show that Gauss-Seidel method still converges, regardless of the 
magnitude of Ke h [31, 32, 34]. (A rigorous proof for the ID case can be found in [35].) 

It is well known that classical iterative (relaxation) methods converge slowly for solving 
large sparse linear systems. Many iterative methods have also been used to solved 2D convection 
diffusion equations discretized by other schemes [4, 6, 7, 12, 24, 26, 37]. Our choice of linear system 
solver is multigrid method which has been shown to be very effective for solving discretized elliptic 
problems [8, 30]. 

The rnultigrid method is based on the idea that classical relaxation methods such as Gauss- 
Seidel iteration strongly damp the oscillatory error components, but converge slowly for smooth 
error components. Hence, after a few relaxation sweeps, we compute the smooth residual and 
project it to a coarser grid on which the smooth error components become more oscillatory. Solving 
the residual equation on a coarse grid, interpolating the error correction back to the fine grid, and 
adding it to the current approximate solution give the two level method. The multigrid method 
exploits the idea that the residual equation on the coarse grid has a similar structure as the original 
problem on the fine grid and the basic idea of the two level method can be applied recursively. 
Therefore, on the coarse grid, relaxation sweeps are carried out and the smooth residual is projected 
to a coarser grid. This process may go down to a coarsest grid where a direct solver or several 
relaxation sweeps are employed to obtain a solution (both approaches are cheap because the size of 
the linear system on the coarsest grid is small ) . Then the corrections are interpolated back to finer 
grids until the process reaches the finest grid and the fine grid approximate solution is corrected. 
The procedure just described is a simple multigrid V cycle algorithm. A multigrid V {v \ , v-i) cycle 
algorithm is to do V\ relaxation sweeps on a given grid before going to a coarser grid and to do 
V'i relaxation sweeps after adding the coarse grid correction to the current approximation. For an 
introduction to the rnultigrid method and other multigrid cycling algorithms, see Briggs [5] and 
Wesseling [30]. 

For the 19 point compact scheme with constant convection coefficients, Fourier smoothing 
analysis was conducted in [15]. It shows that the Gauss-Seidel relaxation has a smoothing factor 
that is strictly less than 1, which indicates that the rnultigrid method with a Gauss-Seidel relax- 
ation will converge regardless of the magnitude of the cell Reynolds number (constant convection 
coefficients throughout the computational domain). 

Similar Fourier smoothing analysis may be conducted to show that the Gauss-Seidel relax- 
ation with the 15 point compact scheme also has a smoothing factor that is smaller than 1 for any 
constant convection coefficients. 

Multigrid techniques for solving 2D and 3D convection diffusion equations, discretized by the 
fourth order compact schemes, has been studied extensively recently [14, 15, 16, 18, 17, 31, 32, 36]. 
For more detailed description of the multigrid method with the 19 point compact scheme, see [15]. 
The 15 point compact scheme can be accommodated straightforwardly by modifying the relevant 
(relaxation and residual computation) parts in the existing multigrid method. 
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Figure 2: Decoupling the 3D grid points with four colors for Gauss-Seidel relaxation with the 19 
point compact scheme. 

4 Multicoloring Strategies for Parallelism 

It is well known that Jacobi iterative method can be fully parallelized. The drawback of the Jacobi 
method is that when it is used as a smoother in the multigrid method, it usually needs to be 
damped by a damping factor which is difficult to estimate for most practical problems. Even 
with a damping factor obtained by trial and error, the smoothing effect of the (damped) Jacobi 
relaxation is usually poor. 

The lexicographic Gauss-Seidel relaxation, which has a better smoothing effect than the 
Jacobi relaxation, is often used as the smoother in multigrid method. For parallelization and 
vectorization benefit, we may reorder the grid points by dividing them into several colored groups 
so that parallel relaxation sweeps can be carried out within each group. In the 2D case, four colors 
are needed to decouple a 9 point compact scheme. In the 3D case with our 19 point compact 
scheme, four colors are sufficient to completely decouple the 3D grid points [15]. For simplicity, we 
assume that (R)ed, (B)lack, (G)reen and (O)range colors are used. For a grid point with a given 
color, it is necessary that the nearest grid points along the three coordinate directions are marked 
with different colors. Figure 2 depicts a reference grid point colored with red and its 18 nearest 
neighboring grid points are colored with black, green and orange. Note that updating a red point 
needs the values of 2 nearest and 4 next nearest grid points marked w r ith each of the other three 
colors. For the 19 point compact discretization scheme, we noted above that if the grid is colored 
by four colors, all grid points with each color can be updated simultaneously on parallel computers 
and four subsweeps can be carried out to perform a Gauss-Seidel relaxation over the whole grid. 
This approach is referred to as four color Gauss-Seidel relaxation. 

It is interesting that a 3D computational grid with the 15 point compact difference scheme can 
be decoupled with only two colors [13]. This is shown in Figure 3. It can be seen that a reference 
R(ed) point is linked to 14 other B(lack) points with the 15 point compact scheme. Hence, two 
color red-black Gauss-Seidel relaxation with the 15 point compact scheme can update the entire 
computational grid in two parallel subsweeps, while the four color Gauss-Seidel relaxation needs 
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Figure 3: Decoupling the 3D grid points with t,w< 
point compact scheme. 
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four parallel subsweeps with the 19 point compact scheme. This difference could be utilized for 
the advantage of the 15 point compact scheme on parallel computers. 

The colored Gauss-Seidel relaxation leads to highly parallelizable solvers. Parallelization is 
obtained since the grid points with each color are decoupled and all the equations of a single 
color can be computed independently of the other colors. The computations are performed in a 
number of parallel operations equal to the number of independent colors. In addition to the gains 
in parallelization, practical experience showed that better multigrid convergence and smoothing 
properties are usually obtained with multiple color ordering. 


5 Numerical Validation 

Two test problems are chosen to numerically validate the new 15 point compact scheme and the 
parallel multigrid solution method. We compare its performance with that of the 19 point compact 
scheme. For our experiments, we expect the 15 point compact scheme to be slightly less accurate 
than the 19 point compact scheme but to cost less per multigrid cycle. 

The multigrid method described in [15] is used to solve the sparse linear systems arising from 
the fourth order compact discretizations. Red-black and four color Gauss-Seidel relaxations are 
used with the 15 and 19 point compact schemes respectively as the multigrid smoothers. The com- 
putations are terminated when the mean norm of the difference of the successive approximations, 
defined as 


is smaller than 10“ 10 . Here (N - l) 3 is the number of interior grid points (unknowns) and n is 
the number of iterations. The errors reported are the maximum absolute errors over the entire 
discrete grid points. 


A r — 1 N - 1 A r — 1 
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i = 1 j ~ 1 k = 1 
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u ijk ~ u ijk 
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(n - iy 
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Table 1: Comparison of maximum absolute errors for Test 1 with different mesh sizes and Re. 



15 Point Scheme 

19 Point Scheme 

h = 1/32 

h = 1 / 64 

h =1/128 

h = 1/32 

h = 1/64 

h = 1/128 

Re = 0 

4. 02 (-5) 

2.52(-6) 

1.57(-7) 

1.04(-5) 

6.52(-7) 

4.07(-8) 

Re = 1 

3.88(-5) 

2.42(-6) 

1.51(-7) 

1.01 (-5) 

6.28(-7) 

3.93(-8) 

Re = 10 

1.95(-5) 

1.22(-6) 

7.62(-8) 

6.73(-6) 

4.22(-7) 

2.63(-8) 

Re = 100 

1.28(-3) 

8.1 3(-5) 

5. 11 (-6) 

2.72(-4) 

1.22(-5) 

1.08(-6) 

Re = 1000 

1.27(-2) 

9.07(-4) 

5.73(-5) 

2.60(-3) 

1.92 (-4) 

1.29(-5) 


The computations are conducted on an SGI Power Challenge parallel computer with 4 pro- 
cessors and a 512 MB shared memory. The code is written in standard Fortran 77 and is run in 
double precision. Parallelization is achieved by adding parallel derivatives to the loops in (col- 
ored) relaxation and in residual computation subroutines. The interpolation procedure is not 
parallelized. 

Test 1. For the first test problem, the following coefficients are specified for (1) 

p = Re sin ys\n zcosx, q = Re sin x sin z cosy, r = Re sin x sin y cos 2 . 

The computational domain is the unit cube ft = (0, l) 3 . The constant Re represents the magnitude 
of the convection coefficients and simulates the Reynolds number in a flow simulation. The Dirichlet 
boundary conditions and the forcing term / are set to satisfy the exact solution u = cos(4z + 6;i j -f 
8z). 

Table 1 shows the maximum absolute error comparison with the 15 and 19 point compact 
schemes with different mesh sizes and different Re. Note that both schemes are of fourth order 
accuracy, in the fact that the maximum absolute errors decrease approximately by a factor of 16 
when the mesh size is halved. The computed solutions from the ID point compact scheme are 
slightly more accurate than the corresponding solutions from the 15 point compact scheme. The 
difference is about a factor of from slightly larger than 3 to slightly less than 5. This result agrees 
with our truncation error analysis in Section 2. 

The number of multigrid V(l,l) iterations for both schemes is listed in Table 2 with different 
mesh size and different Re. Multigrid grid independent convergence rate is achieved for both 
schemes with Re < 100, although the 19 point compact scheme converged more quickly. The 
convergence rates of both schemes are inversely affected by the magnitude of Re. Note that the 
accuracy of the computed solution from both schemes is inversely affected by the magnitude of Re. 

The CPU time in seconds with multiple processors is compared in Table 3 with different 
mesh sizes and a fixed Re = 10. For reference, the number of iterations in each case is listed in 
the last row. We notice that, with one processor (similar to serial computations), the 15 point 
compact scheme actually took more CPU time to converge (since more iterations are needed for 
convergence), in spite of the fact that it requires less arithmetic operations in each iteration. 
However, as more processors are utilized, the 15 point compact scheme is actually faster than the 
19 point compact scheme. The difference would be larger should we be able to use more processors. 

For large Re, Table 2 indicates a substantial increase in the number of rnultigrid V(l,l) 
iterations. To reduce the number of iterations, a multigrid V(2,2) cycle may be used. The test 
results (CPU seconds) with different mesh sizes and a fixed Re = 1000 are listed in Table 4. We 
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Table 2: Number of multigrid V(l,l) iterations for Test 1 with different mesh sizes and Re. 



15 Point Scheme 

19 Point Scheme 

h = 1/32 

h = 1/64 

h =1/128 

h = 1/32 

h= 1/64 

h =1/128 

Re = 0 

12 

12 

12 

10 

11 

11 

Re = 1 

12 

12 

12 

10 

11 

11 

Re = 10 

12 

12 

12 

11 

11 

11 

Re = 100 

29 

30 

29 

26 

27 

26 

Re = 1000 

51 

57 

71 

52 

56 

66 


Table 3: Multiprocessor CPU time in seconds (multigrid V(l,l)) for solving Test 1 with different 
mesh sizes and Re = 10. 



h = 1/32 

h = 1/64 

h= 1/128 

15 Point 

19 Point 

15 Point 

19 Point 

15 Point 

19 Point 

1 processor 

1.92 

1.92 

16.49 

15.40 

139.34 

129.31 

2 processors 

1.15 

1.17 

9.24 

8.99 

75.99 

71.25 

3 processors 

0.92 

0.87 

6.92 

7.01 

55.88 

53.74 

4 processors 

0.73 

0.71 

5.63 

5.81 

43.61 

44.02 

Iterations 

12 

11 

12 

11 

12 

11 


point out that the speedup with 4 processors for the 15 point compact scheme with h — 1/128 is 
3.21. It is 2.99 for the 19 point compact scheme. Hence the two colorable 15 point compact scheme 
appears to be more scalable. 

Test 2. The convection coefficients of (1) for this test problem are chosen as 

p= Rex(l -y)(l -2z), q = Rey{l - z)(l - 2x), r = Rez(l - x)(l - 2z). 

The Dirichlet boundary conditions and the forcing function are specified to satisfy the exact func- 
tion u — sin irx sin try sin wz. In all calculations for this problem, multigrid V(2,2) cycle iterations 
are used. Table 5 tabulates the maximum absolute errors of the computed solutions from both the 


Table 4: Multiprocessor CPU time in seconds (multigrid V(2,2)) for solving Test 1 with different 
mesh sizes and Re = 1000. 



h = 1/32 

h = 1/64 

h = 1/128 

15 Point 

19 Point 

15 Point 

19 Point 

15 Point 

19 Point 

1 processor 

8.29 

8.87 

82.56 

85.19 

747.24 

703.27 

2 processors 

4.82 

5.23 

45.79 

49.35 

402.78 

385.26 

3 processors 

3.73 

3.73 

34.31 

38.16 

288.71 

288.50 

4 processors 

3.09 

3.03 

27.71 

31.47 

233.15 

235.23 

Iterations 

31 

30 

35 

35 

37 

34 
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Table 5: Comparison of maximum absolute errors for Test 2 with different mesh sizes and Re. 



15 Point Scheme 

19 Point Scheme 

h = 1/32 

h = 1/64 

h = 1/128 

h = 1/32 

h = 1/64 

h = 1/128 

Re — 0 

3.48(-6) 


1.36(-8) 

IBMfiriB 

HggiQBfl 


Re = 1 

3. 43 (-6) 

KMIflil 

1.34(-8) 

| 




3.41 (-6) 

2.13(-7) 

1.33 (-8) 

1.03(-6) 


4.04(-9) 


2.60(-5) 

1.64(-6) 

1.03(-7) 

1.56(-5) 

9.78(-7) 

■a EBB 


4.42(-4) 

2.83(-5) 

■irWgiM 

2.97(-4) 

1.87(-5) 

■■IjgiM 


Table 6: Number of multigrid V(2,2) iterations for Test 2 with different mesh sizes and Re. 



15 Point Scheme 

19 Point Scheme 

h = 1/32 

h = 1 /64 

h = 1/128 

h = 1/32 

h = 1/64 

h = 1/128 

Re = 0 

8 

8 

8 

8 

8 

8 

Re = 1 

8 

8 

8 

8 

8 

8 


9 

9 

9 

8 

9 

8 


14 

14 

14 

16 

16 

16 

Re = 1000 

28 

27 

32 

31 

25 

29 


15 and 19 point compact schemes, when the mesh sizes and Re change. The computed solutions 
from the 15 point compact scheme are very close to those from the 19 point compact scheme, with 
a difference of a factor of approximately 3. The difference is actually smaller when Re is large. 

Increasing the number of relaxation sweeps at each level of the multigrid algorithm helps 
reduce the convergence difference between the 15 and 19 point compact schemes. The results in 
Table 6 show that, in some cases, the 15 point compact scheme converge even faster than the 19 
point compact scheme. 

In Table 7, we compare the parallel efficiency of the two schemes with different Re and a 
fixed h = 1/128 when different number of processors are utilized. The parallel run time is affected 
by the convergence rates of the multigrid iterations, which in turn are affected by the magnitude 
of Re for both schemes. If the convergence rates are the same, the 15 point compact scheme is 
faster. 

6 Conclusion 

We derived a new fourth order compact finite difference scheme for the 3D convection diffusion 
equation with which the computational grid can be decoupled with only two colors, when Gauss- 
Seidel type iterative method is used to solve the resulting sparse linear systems. A parallel imple- 
mentation of a multigrid method is discussed. Numerical experiments are conducted to compare 
the new compact difference scheme with the existing 19 point compact difference scheme. 

Our studies show that the 15 point compact scheme may have the advantage of delivering 
fast solution on parallel computers. Our tests show that with 4 processors, multigrid method with 
the 15 point compact scheme needs less run time than with the 19 point compact scheme, if the 
multigrid convergence rates do not differ a lot. The richer parallelism of the 15 point compact 
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Table 7: Multiprocessor CPU time in seconds (multigrid V(2,‘2)) for solving Test 2 with different 
mesh size h — j 128 and different Re. 



Re = 10 

Re = 100 

Re = 1000 

15 Point 

19 Point 

15 Point 

19 Point 

15 Point 

19 Point 

1 processor 

182.86 

188.20 

305.19 

333.23 

648.63 

604.37 

2 processors 

99.04 

102.19 

165.24 

181.84 

352.26 

328.10 

3 processors 

72.32 

76.31 

112.46 

136.18 

256.83 

245.26 

4 processors 

56.45 

61.46 

94.52 

108.74 

199.67 

197.46 

Iterations 

9 

9 

14 

16 

32 

29 


scheme will be more advantageous if more processors are employed. Another noticeable advantage 
of the 15 point compact scheme is that it uses 21.05% less memory than the 19 point compact 
scheme and requires less computational time per multigrid cycle. 

For certain convection diffusion problems with boundary layers, uniform discretization may 
not be suitable. However, grid transformation can be used to transform a graded grid into a 
uniform grid, on which the fourth order compact scheme can be applied, see [10, 9, 38] for details. 
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Appendixes 

A The 15 Point Fourth Order Compact Difference Scheme 


The 15 point fourth order compact finite difference scheme for Equation (1) can be written as 


26 


aim + aim = Fo, 


( 16 ) 


/=o 


/ — 1 9 


where the stencil coefficients are given as: 

q 0 = —[28 + (pi — pz -f <72 — ^4 + r ,5 — rc)/t -f (p5 + 7o + r o)^~]> 

cii = 4 + -(2po + 3 p\ -f pi - pz + Pa + Ps + Pe)h 

4 
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a 2 = 


3 = 


a 4 = 


«5 = 


a 6 = 


aio — 


Q'iO = 


a 21 = 


a*22 = 


C*23 “ 


0:24 — 


«25 ~ 


«26 — 


+ ^[4po + Po(Pi - Pz) + 7o(P2 - Pa) + r 0 (p 5 - Pe)]* 2 , 

4 + - (2^0 4- <71 -f 3f?2 + <73 ~ <74 + <75 4 

+ £ [ 4 7o + Po(7i - *73 ) + *7o (*7-2 - Qa) + r 0 (q 5 - q 6 )]fi 2 , 

4 — ^ (2p 0 - Pi + P> + 3p 3 +Pa+ P5+ Pe)lt 

+ ^[4Po - Po(Pi - P:;) - 7o(P2 - Pa) - r 0 {pa - p 6 )}h 2 , 

4 — ^(2^0 + 7i — 72 + 73 + 374 + 75 + 7 g)/* 

+ g [ 4 7o ~ Po(7i “ 7s) “ 7o(72 ~ 74) - r 0 (q 5 - q 6 )]fi 2 , 

4 + ^(2r 0 + n + r„> + r 3 + r 4 4- 3r 5 - q 6 )h 

4 

+ g[ 4r o + Po(n - r 3 ) + 7 o(r 2 - r 4 ) + r 0 (r 5 - r 6 )]* 2 , 
4 - ^(2r 0 + n 4- r 2 + r 3 4- r 4 - r 5 4- 3q 6 )h 
+ ^[4ro - Po(n - r 3 ) - q 0 (r- 2 - r 4 ) - r 0 (r 5 - r 6 )]/i 2 , 


| + -j^[ 4 (po + 7o + r 0 ) 4- pi - p 4 + P5 - Pe + 7i - 73 
+ 75 - 76 + n - r 3 + r 2 - r 4 ]h + ^(po7o + Pon> + qor 0 )fi 2 , 


5 - i^ 4 <» 


7o - r 0 ) + Pi - Pa + P5 ~ Pe + q\ - 73 


- 7s + 76 + r\ - r 3 - r- 2 + r 4 \h - -(poTo + Pon> - 7o»~o)* 2 , 
\ ~ "i^ 4 ^ 0 + q °~ r °) “ P2 + Pa + Ps - Pe - 7i + 73 


+ 75 - 76 + n - r 3 + r-i - r 4 \h + -(po7o - Po^o - qar 0 )K 2 , 

1 + -j^[ 4 (po - 70 + r 0 ) - P2 + P4 + P5 - P6 - 7l +73 

- 75 + 76 + r-[ - r 3 - r -2 + r 4 \h - ^(po7o - Po^o + 7oro)* 2 , 

O 

2 + ^ [4(Po + 7o - r 0 ) + pa - Pa - Ps + Pe + 7i - 73 

- 75 + 76 - r\ 4- r 3 - r 2 -I- r 4 ]/i 4- ^(po7o - Po^o - 7oro)* 2 , 

o 

2 ~ -j^[ 4 (Po " ?o + r 0 ) + p> - p 4 - Ps + Pe + 7i - 73 
+ 75 — 76 — r \ + r 3 + r -2 — r 4 ]/i — -(po7o — Po^o + 7o r o)* 2 , 

O 

2 _ “jg t 4 (Po + 7o+ r o) - Pi +Pa - Pr. + P6 - 7i +73 

- 75 + 76 - n + r 3 - r-i + r 4 ]/t 4- — (po7o + Pon> + qor 0 )fi 2 , 

1 1 ( „ 

2 + Jg l 4 (Po - 7o - r 0 ) -Pi+Pa- Ps + Pe - 7 i +73 
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+ 75 - <76 - n 4- r 3 + r-j - r 4 \h - ~{poQo + Por 0 - qor 0 )h 2 , 

F 0 = 2^*^° + Z 1 + + / 3 + /4 + h + faV l ~ 

+ 7bo(/i - h) + 7o(/-2 - A) + r 0 (/s - /e)]^ 3 • 
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